Case Study · 2026 · Live in production
Client identity under NDA · sector & metrics disclosed with permission
Client: Global Financial Services Firm
Sector: Investment Management · Private Markets
End users: Investment professionals & operations staff
Built by: Envyro · 2026
Status: ● Live · 24 / 7

A confidential case study · internal RAG agent · entitlement-aware

How a global investment firm turned 12 years of deal memos into a 4-second answer.

A senior-analyst-grade research agent — embedded in Slack and the firm's intranet — that answers policy, precedent, and process questions from the firm's own internal corpus, without exposing a single document to a public model.

Internal RAG · Slack-Native · Entitlement-Aware · Single-Tenant · Citation-First · Under NDA
~7,200 / mo
Internal queries answered across the firm
4.1s
Avg. first-token response time
~92%
Answered cleanly, no human escalation
~6 hrs / wk
Reclaimed per analyst on research lookups
01 · The Challenge

Twelve years of knowledge — findable only if you already knew where.

Deal memos, IC decks, policies, and playbooks scattered across SharePoint, Notion, and shared drives. Every senior analyst's inbox was the search engine — and the bottleneck.

01 / 04

Tribal knowledge locked in 12+ years of files

Deal memos, IC decks, policies, and playbooks scattered across SharePoint, Notion, and shared drives. Findable only if you already knew where to look.

02 / 04

Senior analysts paged for the same questions weekly

The same precedent and policy questions cycled back to the same three or four people, every week — pulling them out of the work they were hired to do.

03 / 04

Strict data residency requirements

Nothing could leave the firm's environment. Public APIs were a non-starter. Whatever the answer was, it had to live entirely inside the VPC.

04 / 04

Onboarding measured in months

New hires spent weeks learning where knowledge lived before they could actually use it. Day-one productivity was a fantasy.

02 · The Solution

A single-tenant research agent — that cites every claim it makes.

Envyro partnered with the firm to design and deploy a single-tenant, entitlement-aware RAG agent — indexing 200,000+ internal documents and serving every team through Slack and the intranet.

Every answer carries citations back to the source document and page. Every retrieval respects the user's actual access rights. Nothing the user shouldn't see ever enters the prompt.

Built by Envyro · Running inside the firm's VPC.

Hybrid retrieval over 200K+ docs

BM25 + dense vectors over deal memos, IC decks, policies, and playbooks — twelve years of institutional knowledge made searchable.

Entitlement-aware

Every query is filtered through the user's access rights before retrieval. Documents the user isn't entitled to see never reach the prompt, so the model cannot reason over them.

Slack-native + intranet widget

Zero onboarding, no new app to learn. The agent lives where the work already happens — DM it, mention it, ask it inline.

Citation-first answers

Every claim links back to its source document and page. No source means no answer — and the user can verify in one click.
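
As an illustration of the entitlement-aware behavior described above, here is a minimal Python sketch of filtering before retrieval. It is not the firm's implementation: the `Document` class, the group names, and the toy keyword match are all assumptions made for the example. What it shows is the ordering: documents outside the user's groups are dropped before any search runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_groups: frozenset  # groups entitled to read this document (hypothetical model)


def entitled_subset(corpus, user_groups):
    """Keep only documents that share at least one group with the user."""
    groups = frozenset(user_groups)
    return [doc for doc in corpus if doc.allowed_groups & groups]


def search(corpus, user_groups, query):
    """Filter by entitlement first, then run a toy keyword match on what is left."""
    terms = query.lower().split()
    visible = entitled_subset(corpus, user_groups)
    return [doc.doc_id for doc in visible
            if any(term in doc.text.lower() for term in terms)]


# Made-up corpus for illustration only.
corpus = [
    Document("memo-001", "Co-investment fee precedent", frozenset({"deal-team-a"})),
    Document("policy-007", "Fee policy for co-investment vehicles", frozenset({"all-staff"})),
]

# A user who is only in "all-staff" never has memo-001 in scope.
print(search(corpus, {"all-staff"}, "fee precedent"))
```

Because the filter runs first, a restricted memo cannot surface in results or in the prompt, whatever the query says.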

"
What used to mean DM'ing a partner and waiting until tomorrow now takes four seconds — and the answer comes with citations.
Global Financial Services Firm
Senior leadership · live with Envyro-built platform
03 · Deployment

One platform. Every team.

A single platform deployed inside the firm's VPC, rolled out team-by-team over six weeks. No public APIs, no third-party model exposure, no shared tenancy.

1 platform
Single-tenant, in-VPC
0
Public-API dependencies
24 / 7
Always-on
Step 01

Corpus + entitlements

Source connectors wired into SharePoint, Notion, and shared drives. The firm's access model mirrored exactly — no shortcuts.

Step 02

Index + tune

Hybrid retrieval indexed. Eval suite built against real internal questions. Citation behavior tuned until it stopped guessing.

Step 03

Live + observed

Rolled out team-by-team. Every query, retrieval, and feedback signal logged for tuning. The system gets sharper every week.
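
One way an eval suite like the one in Step 02 could score citation behavior is sketched below in Python. The case format, the `answer_fn` callable, and the stub agent are hypothetical; the case study does not describe the firm's actual eval harness.

```python
def citation_hit_rate(cases, answer_fn):
    """Share of eval questions whose citations include the expected source."""
    if not cases:
        raise ValueError("eval suite is empty")
    hits = 0
    for question, expected_doc in cases:
        cited_docs = answer_fn(question)  # doc ids the agent cited
        if expected_doc in cited_docs:
            hits += 1
    return hits / len(cases)


# Hypothetical eval cases: (question, document that must be cited).
cases = [
    ("What is the co-investment fee policy?", "policy-007"),
    ("Which memo set the fee precedent?", "memo-001"),
]


def stub_agent(question):
    """Stand-in for the real agent: always cites the same policy document."""
    return ["policy-007"]


print(citation_hit_rate(cases, stub_agent))  # 0.5
```

Tracking a number like this across tuning rounds gives a concrete signal for whether citations are getting more reliable.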

04 · Production Data

Where every query actually lands.

A representative month: roughly 7,200 queries across the firm, about 92% answered cleanly with citations. The remaining 8% or so were routed to the right human SME with full context attached.

~7,200
Queries / month
4.1 sec
Avg first token
4 surfaces
Slack · intranet · email · API
Live
In production
Where every query lands
By volume
Representative month · ~7,200 queries · outcome distribution
~92% answered cleanly · ~8% routed with full context
No hallucinated answers · every claim cited
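
For a sense of scale, the split can be worked out from the rounded figures above. This is simple arithmetic on approximate inputs, so the results are estimates, not reported counts.

```python
monthly_queries = 7_200   # representative month, rounded
answered_share = 0.92     # approximate share answered cleanly

answered = round(monthly_queries * answered_share)
routed = monthly_queries - answered

print(answered, routed)  # 6624 576
```

That is roughly 6,600 cited answers and just under 600 handoffs to a human SME in a month.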
05 · The Validation Gate

Ninety-two in a hundred answered cleanly. The rest get a warm handoff.

When the model isn't confident in a citation, it doesn't guess — it asks. The remaining 8% land in front of the right human SME with the question and partial context already attached.

Auto-answered
~92%

Answered with grounded citations

Hybrid-retrieved, entitlement-filtered, and cited back to source — answer delivered in the surface the user asked from, in under five seconds.

Routed
~8%

Routed to the right human SME

Surfaced to the SME with the original question, partial retrieval, and the model's hesitation reason — so the human picks up exactly where the agent stopped.

When the model isn't confident in a citation, it doesn't guess — it asks. That single decision is what makes the system safe to run firm-wide.
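
The gate described in this section can be sketched as a single decision function. This is a minimal Python illustration under stated assumptions: the `Draft` fields, the numeric `confidence` score, and the 0.75 threshold are hypothetical, since the case study does not say how confidence is measured.

```python
from dataclasses import dataclass, field


@dataclass
class Draft:
    question: str
    answer: str
    citations: list = field(default_factory=list)  # (doc_id, page) pairs
    confidence: float = 0.0                        # hypothetical citation-confidence score


def validation_gate(draft, threshold=0.75):
    """Return an auto-answer, or a handoff packet explaining why the agent stopped."""
    if not draft.citations:
        reason = "no supporting source retrieved"
    elif draft.confidence < threshold:
        reason = f"citation confidence {draft.confidence:.2f} below {threshold:.2f}"
    else:
        return {"route": "auto", "answer": draft.answer, "citations": draft.citations}
    # Low confidence or no source: hand the SME everything the agent had.
    return {
        "route": "sme",
        "question": draft.question,
        "partial_retrieval": draft.citations,
        "hesitation_reason": reason,
    }


confident = Draft("Fee policy?", "See section 3.", [("policy-007", 3)], 0.91)
unsure = Draft("Fee precedent?", "Possibly memo 112.", [("memo-112", 8)], 0.40)

print(validation_gate(confident)["route"])
print(validation_gate(unsure)["hesitation_reason"])
```

The handoff packet carries the original question, whatever was retrieved, and the reason for stopping, which mirrors the three items the routed path is described as passing to the SME.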

06 · How It Works

From question asked to cited answer returned.

A single pipeline carries every query through five stages — entitlements, retrieval, generation, citation, and feedback — in under five seconds, with full traceability at every step.

~4.1 sec
First-token latency, end to end — including entitlement resolution and retrieval.
Zero leakage
Entitlement filter runs before retrieval. The model cannot reason over docs the user can't access.
Full audit trail
Every query, retrieval, and answer logged — every citation traceable back to source.
Step 01
Question asked

Analyst pings the agent in Slack or the intranet widget. No new tool, no context-switch tax.

Step 02
Entitlements resolved

User's access rights loaded in real time. Retrieval scope is narrowed before search even runs.

Step 03
Hybrid retrieval

BM25 + dense search over the entitled subset of the corpus. Best of lexical and semantic, on the right slice.

Step 04
Grounded generation

LLM answers only from retrieved sources. No source means no answer — the system would rather say it doesn't know.

Step 05
Cited + logged

Answer returned with linked citations. Thumbs and corrections logged for ongoing tuning.
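
The hybrid retrieval in Step 03 needs some way to merge the lexical and dense result lists. The case study does not say which method is used; the Python sketch below assumes reciprocal rank fusion (RRF), a common choice, with the conventional constant k = 60. The document ids and rankings are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids; earlier ranks contribute larger scores."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), breaking ties by doc id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_ranking = ["policy-007", "memo-112", "deck-031"]   # lexical results (made up)
dense_ranking = ["memo-112", "deck-031", "memo-204"]    # embedding results (made up)

print(reciprocal_rank_fusion([bm25_ranking, dense_ranking]))
# ['memo-112', 'deck-031', 'policy-007', 'memo-204']
```

Documents that both retrievers rank highly rise to the top, while a document found by only one retriever still stays in the candidate list. In the pipeline described here, both input rankings would already be restricted to the entitled subset of the corpus.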

07 · Before / After

The same question, at less than a hundredth of the wait.

What a single research question used to mean for the firm, versus what it means now. The work shape is the same; the time-to-answer collapsed.

Before · per question
30 – 45 min
  • Hunt across SharePoint, Notion, and shared drives
  • Slack DM the partner who probably knows
  • Wait for a reply — sometimes the next day
  • Re-read three old IC decks to triangulate
  • Lose context on the actual task at hand
  • Answer only as good as the inbox you searched
After · per question
< 10 sec
  • Ask in Slack or the intranet widget
  • Entitlement-filtered retrieval runs instantly
  • Answer returned with linked citations
  • Source documents one click away
  • Feedback signal logged for the next query
  • Works at 11pm, on weekends, on day one
08 · The Impact

The firm runs the same — just faster, longer, and on the record.

Senior analyst hours come back. New hires are productive from day one. Knowledge stops leaving with people who leave. And nothing crosses the firm's boundary, ever.

i.

~6 hours per analyst per week, reclaimed

Returned to investment work, not file-hunting. Across the analyst bench, that's measurable IC throughput.

ii.

New-hire ramp cut from months to weeks

Day-one access to firm precedent and policy — without having to know which partner to ask first.

iii.

Institutional memory survives departures

Knowledge stays in the system, not in inboxes. When someone leaves, what they knew doesn't leave with them.

iv.

Zero data leaves the firm's environment

Single-tenant, in-VPC, audit-logged. The agent runs where the data already lives — no exceptions.

09 · Technology Stack

Single-tenant, entitlement-aware, and built for the firm's perimeter.

Trigger
Slack message · intranet widget query — per-question event ingestion.
Identity
Per-user entitlement resolver — access rights enforced before retrieval, not after.
Document Handling
Connectors for SharePoint, Notion, and shared drives — 200K+ docs indexed.
AI Layer
Hybrid BM25 + dense retrieval · private LLM gateway · grounded-only generation.
Surfaces
Slack-native bot · intranet widget · email digest · internal API.
Review Loop
SME routing for low-confidence answers · feedback signal captured per query.
Audit & Logging
Every query, retrieval, and answer logged · full access-rights audit trail.
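
To make the audit trail concrete, here is a minimal Python sketch of one log record written as a JSON line. The field names and values are hypothetical; the case study states only that every query, retrieval, and answer is logged along with access rights.

```python
import json
from datetime import datetime, timezone


def audit_record(user_id, groups, question, retrieved, route):
    """Build one JSON line tying a query to the access rights and sources used."""
    return json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "groups": sorted(groups),
        "question": question,
        "retrieved": [{"doc_id": d, "page": p} for d, p in retrieved],
        "route": route,  # e.g. "auto" or "sme"
    }, sort_keys=True)


line = audit_record("u-123", {"all-staff"}, "What is the fee policy?",
                    [("policy-007", 3)], "auto")
print(line)
```

Recording the resolved groups next to the retrieved sources lets a reviewer confirm after the fact that each citation was within the user's entitlements at query time.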
10 · About Envyro

Production-grade AI agents — not demos.

Envyro is a specialized AI agency designing, deploying, and maintaining custom AI agents and pipelines that work in production. We stay on the call as your systems evolve.

SaaS · Collision Repair

Nexsyis

Shop management platform · AI email pipeline embedded into the stack.

Commercial · Maritimes

Office Interiors

Office equipment & service · bilingual voice AI for inbound calls.

Public Sector · Durham, NC

Durham County

350K+ residents · 24/7 GenAI resident support across municipal services.

Real Estate · NYSE

Veris Residential

$1.6B NYSE-listed REIT · resident-services AI across the portfolio.

Let's talk

Got a corpus nobody can find their way around?

Tell us where the knowledge sits. We'll show you what a production-grade RAG agent inside your perimeter looks like — and what the next two weeks could return.

matea@envyro.io
519 · 658 · 3579
envyro.io